Refactor c2st and add a unit test for metrics #503
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main     #503      +/-  ##
=========================================
- Coverage   67.96%   66.94%   -1.02%
=========================================
  Files          56       67      +11
  Lines        4117     4396     +279
=========================================
+ Hits         2798     2943     +145
- Misses       1319     1453     +134
```
It would be cool if anyone could have a look at this PR. In the …
Thanks for the ping @psteinb! I think this is very valuable input you give 👍, but I would need to look into it a bit, which could happen next week at the earliest. Also, I think it would make sense to get @jan-matthis's input on this topic.
Sure thing, I just found some time today to invest in this. The difference in runtime for the test cases I could conceive (and there are likely more) is striking. Looking forward to your feedback!
I merged main into this branch so that the CI passes.
Just wanting to pick this up here. @janfb and @jan-matthis, should I update this PR and bring the random-forest-based …
As suggested by @janfb, I ran a small analysis on a synthetic dataset with … Bottom line for me:
@janfb or anyone else: this is ready for review.
Thanks a lot for the effort, this is great! Two major points:
- Is there a reason to change the default classifier?
- Overall, these new tests should not exceed five minutes (three would be even better); otherwise our total test time will explode in the long run. Could you check this and remove tests if necessary?
Aside: all tests are failing atm because of a missing import
Depends on the input data.
Fair enough, I'll look into this. Based on my analysis here, using a smaller dataset size should be fine. Those 3 minutes apply to all non-slow tests?
I'll see to it.
Yes, making it configurable is great, but I would stick to the MLP as the default. The 3-5 minutes apply to all tests, including slow tests. (Sorry about being picky here, but I am really actively trying to keep the test time manageable.) Thanks a lot!
Thanks, this is great! I think the speed-up we gain from using random forest vs. MLP is a very strong argument for changing the defaults, @michaeldeistler. You do not agree? What is your argument for keeping the MLP?
- reducing the sample count based on studies in github.com/psteinb/c2st
- expanding the bounds for assert statements to comply with the uncertainties obtained there
- renaming tests to better match their scope
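(For illustration: a hypothetical test in the spirit of these changes. It assumes `c2st` is importable from `sbi.utils.metrics` and returns a torch tensor; the sample count and tolerance are made-up values, not the ones from the PR.)

```python
import torch

from sbi.utils.metrics import c2st  # assumed import path


def test_c2st_on_identical_gaussians():
    # Two sample sets from the same Gaussian: c2st should sit near
    # chance level (0.5). The tolerance is deliberately wide to cover
    # the run-to-run uncertainty observed in github.com/psteinb/c2st.
    num_samples, dim = 1024, 10  # reduced sample count keeps the test fast
    X = torch.randn(num_samples, dim)
    Y = torch.randn(num_samples, dim)

    score = c2st(X, Y)

    assert torch.allclose(score, torch.tensor([0.5]), atol=0.05)
```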
Overall great, thanks a lot! Some things need to be changed, as indicated in the comments, and I think the tests can be reduced a lot with a bit of refactoring.
- 2 functions available: c2st and c2st_scores
- use early stopping for MLP implementation
- c2st is the easier interface
  + allows changing the classifier using a string argument
  + returns the mean score from cross-validation
- c2st_scores is more versatile
  + can consume an sklearn-compatible classifier class
  + returns all scores from the cross-validation
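(To make the two-function split concrete, a usage sketch. The signatures are paraphrased from the bullet points above; the exact parameter names and the classifier string values are assumptions, not necessarily what was merged.)

```python
import torch
from sklearn.ensemble import RandomForestClassifier

from sbi.utils.metrics import c2st, c2st_scores  # assumed import path

X = torch.randn(1024, 5)
Y = torch.randn(1024, 5) + 0.5

# Easy interface: choose the classifier by a string argument and get
# the mean cross-validation score back.
mean_score = c2st(X, Y, classifier="rf")  # "rf" is an assumed value

# Versatile interface: pass any sklearn-compatible classifier class
# plus constructor kwargs, and get all per-fold scores back.
all_scores = c2st_scores(
    X,
    Y,
    clf_class=RandomForestClassifier,  # assumed parameter name
    clf_kwargs={"n_estimators": 100},  # assumed parameter name
)
```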
Yes, I'm perfectly fine with swapping out the classifier default. But please run all tests (including the slow ones) and make sure that they pass. Thanks!
Thanks for the feedback. I introduced early stopping for the MLP classifier and am currently rerunning my mini-study. It does reduce the MLP runtime considerably iff the two ensembles are (far) apart from each other and are easy to classify. If that is not the case, the runtime is still rather long. All tests pass on my machine and in the CI. Runtimes look good to me:
I re-requested the review.
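(For reference, the early stopping mentioned above is a built-in flag of sklearn's MLP; a minimal sketch of such a configuration, with illustrative values:)

```python
from sklearn.neural_network import MLPClassifier

# With early_stopping=True, sklearn holds out validation_fraction of
# the training data and stops once the validation score has not
# improved by tol for n_iter_no_change consecutive epochs. This is what
# cuts the runtime when the two sample sets are easy to separate.
clf = MLPClassifier(
    hidden_layer_sizes=(100, 100),
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=20,
    max_iter=1000,
)
```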
Looks very good, just suggested some refactoring, especially for the tests.
I think it would be great to simplify the tests, and to simplify … Then I would also let @michaeldeistler look at this again. Finally, I think it would be great if you could do an interactive rebase and group these rather many commits into just a couple of commits. Thanks!
- refactored unit tests into parametrized tests
- all tests below 5 s runtime; one pytest call takes below 45 s on a 4-core laptop
- refactored c2st to offer a simplistic interface
- refactored c2st_scores to offer more control
- renamed parameter scoring to metric to prevent confusion with z_score
- removed elif statement
- added type hints
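(A sketch of what such a parametrization could look like; the classifier strings, shifts, and tolerances are illustrative, not the values from the PR.)

```python
import pytest
import torch

from sbi.utils.metrics import c2st  # assumed import path


@pytest.mark.parametrize("classifier", ("mlp", "rf"))  # assumed string values
@pytest.mark.parametrize(
    "shift, expected",
    ((0.0, 0.5), (20.0, 1.0)),  # identical vs. clearly separated samples
)
def test_c2st_detects_shift(classifier, shift, expected):
    X = torch.randn(512, 5)
    Y = torch.randn(512, 5) + shift

    score = c2st(X, Y, classifier=classifier)

    assert torch.allclose(score, torch.tensor([expected]), atol=0.07)
```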
Thanks for improving the tests! Great effort 🚀
Do you want me to squash those commits in this PR and rebase onto main? I am totally happy if you want to just squash-merge. Just let me know.
No, it's fine as it is if we squash-merge. Only if we were to rebase and merge would I prefer some rewriting of the commits.
Hooray! 🕺
I wanted to understand the ins and outs of `c2st` better. I am putting this here; potentially my code can help to improve it. Also, `c2st` is untested currently.
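(For context, the idea behind the classifier two-sample test: train a classifier to tell the two sample sets apart; if it cannot beat chance, i.e. accuracy stays near 0.5, the sets are indistinguishable. A self-contained sketch using plain sklearn, independent of the sbi implementation:)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def c2st_sketch(X: np.ndarray, Y: np.ndarray, n_folds: int = 5) -> float:
    """Mean cross-validated accuracy of a classifier trained to
    distinguish samples X from samples Y: ~0.5 means indistinguishable,
    ~1.0 means trivially separable."""
    data = np.concatenate([X, Y])
    labels = np.concatenate([np.zeros(len(X)), np.ones(len(Y))])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, data, labels, cv=n_folds, scoring="accuracy")
    return scores.mean()


# Samples from the same distribution: score should be near 0.5.
rng = np.random.default_rng(0)
print(c2st_sketch(rng.normal(size=(500, 5)), rng.normal(size=(500, 5))))
```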